Goto

Collaborating Authors

 vote share


Learning from Convenience Samples: A Case Study on Fine-Tuning LLMs for Survey Non-response in the German Longitudinal Election Study

Holtdirk, Tobias, Assenmacher, Dennis, Bleier, Arnim, Wagner, Claudia

arXiv.org Artificial Intelligence

Survey researchers face two key challenges: the rising costs of probability samples and missing data (e.g., non-response or attrition), which can undermine inference and increase the use of convenience samples. Recent work explores using large language models (LLMs) to simulate respondents via persona-based prompts, often without labeled data. We study a more practical setting where partial survey responses exist: we fine-tune LLMs on available data to impute self-reported vote choice under both random and systematic nonresponse, using the German Longitudinal Election Study. We compare zero-shot prompting and supervised fine-tuning against tabular classifiers (e.g., CatBoost) and test how different convenience samples (e.g., students) used for fine-tuning affect generalization. Our results show that when data are missing completely at random, fine-tuned LLMs match tabular classifiers but outperform zero-shot approaches. When only biased convenience samples are available, fine-tuning small (3B to 8B) open-source LLMs can recover both individual-level predictions and population-level distributions more accurately than zero-shot and often better than tabular methods. This suggests fine-tuned LLMs offer a promising strategy for researchers working with non-probability samples or systematic missing-ness, and may enable new survey designs requiring only easily accessible subpopulations.


Recommender Systems for Democracy: Toward Adversarial Robustness in Voting Advice Applications

Berdoz, Frédéric, Brunner, Dustin, Vonlanthen, Yann, Wattenhofer, Roger

arXiv.org Artificial Intelligence

V oting advice applications (V AAs) help millions of voters understand which political parties or candidates best align with their views. This paper explores the potential risks these applications pose to the democratic process when targeted by adversarial entities. In particular, we expose 11 manipulation strategies and measure their impact using data from Switzerland's primary V AA, Smartvote, collected during the last two national elections. We find that altering application parameters, such as the matching method, can shift a party's recommendation frequency by up to 105%. Cherry-picking questionnaire items can increase party recommendation frequency by over 261%, while subtle changes to parties' or candidates' responses can lead to a 248% increase. To address these vulnerabilities, we propose adversarial robustness properties V AAs should satisfy, introduce empirical metrics for assessing the resilience of various matching methods, and suggest possible avenues for research toward mitigating the effect of manipulation. Our framework is key to ensuring secure and reliable AI-based V AAs poised to emerge in the near future.


Exclusion Zones of Instant Runoff Voting

Tomlinson, Kiran, Ugander, Johan, Kleinberg, Jon

arXiv.org Artificial Intelligence

Recent research on instant runoff voting (IRV) shows that it exhibits a striking combinatorial property in one-dimensional preference spaces: there is an "exclusion zone" around the median voter such that if a candidate from the exclusion zone is on the ballot, then the winner must come from the exclusion zone. Thus, in one dimension, IRV cannot elect an extreme candidate as long as a sufficiently moderate candidate is running. In this work, we examine the mathematical structure of exclusion zones as a broad phenomenon in more general preference spaces. We prove that with voters uniformly distributed over any $d$-dimensional hyperrectangle (for $d > 1$), IRV has no nontrivial exclusion zone. However, we also show that IRV exclusion zones are not solely a one-dimensional phenomenon. For irregular higher-dimensional preference spaces with fewer symmetries than hyperrectangles, IRV can exhibit nontrivial exclusion zones. As a further exploration, we study IRV exclusion zones in graph voting, where nodes represent voters who prefer candidates closer to them in the graph. Here, we show that IRV exclusion zones present a surprising computational challenge: even checking whether a given set of positions is an IRV exclusion zone is NP-hard. We develop an efficient randomized approximation algorithm for checking and finding exclusion zones. We also report on computational experiments with exclusion zones in two directions: (i) applying our approximation algorithm to a collection of real-world school friendship networks, we find that about 60% of these networks have probable nontrivial IRV exclusion zones; and (ii) performing an exhaustive computer search of small graphs and trees, we also find nontrivial IRV exclusion zones in most graphs. While our focus is on IRV, the properties of exclusion zones we establish provide a novel method for analyzing voting systems in metric spaces more generally.


LLM Generated Distribution-Based Prediction of US Electoral Results, Part I

Bradshaw, Caleb, Miller, Caelen, Warnick, Sean

arXiv.org Artificial Intelligence

This paper introduces distribution-based prediction, a novel approach to using Large Language Models (LLMs) as predictive tools by interpreting output token probabilities as distributions representing the models' learned representation of the world. This distribution-based nature offers an alternative perspective for analyzing algorithmic fidelity, complementing the approach used in silicon sampling. We demonstrate the use of distribution-based prediction in the context of recent United States presidential election, showing that this method can be used to determine task specific bias, prompt noise, and algorithmic fidelity. This approach has significant implications for assessing the reliability and increasing transparency of LLM-based predictions across various domains.


United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections

von der Heyde, Leah, Haensch, Anna-Carolina, Wenz, Alexander

arXiv.org Artificial Intelligence

Large language models (LLMs) are perceived by some as having the potential to revolutionize social science research, considering their training data includes information on human attitudes and behavior. If these attitudes are reflected in LLM output, LLM-generated "synthetic samples" could be used as a viable and efficient alternative to surveys of real humans. However, LLM-synthetic samples might exhibit coverage bias due to training data and fine-tuning processes being unrepresentative of diverse linguistic, social, political, and digital contexts. In this study, we examine to what extent LLM-based predictions of public opinion exhibit context-dependent biases by predicting voting behavior in the 2024 European Parliament elections using a state-of-the-art LLM. We prompt GPT-4-Turbo with anonymized individual-level background information, varying prompt content and language, ask the LLM to predict each person's voting behavior, and compare the weighted aggregates to the real election results. Our findings emphasize the limited applicability of LLM-synthetic samples to public opinion prediction. We show that (1) the LLM-based prediction of future voting behavior largely fails, (2) prediction accuracy is unequally distributed across national and linguistic contexts, and (3) improving LLM predictions requires detailed attitudinal information about individuals for prompting. In investigating the contextual differences of LLM-based predictions of public opinion, our research contributes to the understanding and mitigation of biases and inequalities in the development of LLMs and their applications in computational social science.


The Moderating Effect of Instant Runoff Voting

Tomlinson, Kiran, Ugander, Johan, Kleinberg, Jon

arXiv.org Artificial Intelligence

Instant runoff voting (IRV) has recently gained popularity as an alternative to plurality voting for political elections, with advocates claiming a range of advantages, including that it produces more moderate winners than plurality and could thus help address polarization. However, there is little theoretical backing for this claim, with existing evidence focused on case studies and simulations. In this work, we prove that IRV has a moderating effect relative to plurality voting in a precise sense, developed in a 1-dimensional Euclidean model of voter preferences. We develop a theory of exclusion zones, derived from properties of the voter distribution, which serve to show how moderate and extreme candidates interact during IRV vote tabulation. The theory allows us to prove that if voters are symmetrically distributed and not too concentrated at the extremes, IRV cannot elect an extreme candidate over a moderate. In contrast, we show plurality can and validate our results computationally. Our methods provide new frameworks for the analysis of voting systems, deriving exact winner distributions geometrically and establishing a connection between plurality voting and stick-breaking processes.


Evaluating District-based Election Surveys with Synthetic Dirichlet Likelihood

Mitra, Adway, Dey, Palash

arXiv.org Artificial Intelligence

In district-based multi-party elections, electors cast votes in their respective districts. In each district, the party with maximum votes wins the corresponding seat in the governing body. Election Surveys try to predict the election outcome (vote shares and seat shares of parties) by querying a random sample of electors. However, the survey results are often inconsistent with the actual results, which could be due to multiple reasons. The aim of this work is to estimate a posterior distribution over the possible outcomes of the election, given one or more survey results. This is achieved using a prior distribution over vote shares, election models to simulate the complete election from the vote share, and survey models to simulate survey results from a complete election. The desired posterior distribution over the space of possible outcomes is constructed using Synthetic Dirichlet Likelihoods, whose parameters are estimated from Monte Carlo sampling of elections using the election models. We further show the same approach can also use be used to evaluate the surveys - whether they were biased or not, based on the true outcome once it is known. Our work offers the first-ever probabilistic model to analyze district-based election surveys. We illustrate our approach with extensive experiments on real and simulated data of district-based political elections in India.


Design and analysis of tweet-based election models for the 2021 Mexican legislative election

Vigna-Gómez, Alejandro, Murillo, Javier, Ramirez, Manelik, Borbolla, Alberto, Márquez, Ian, Ray, Prasun K.

arXiv.org Artificial Intelligence

Modelling and forecasting real-life human behaviour using online social media is an active endeavour of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory that could be used to gauge and predict social behaviour. During the last decade, the user base of Twitter has been growing and becoming more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.


Prediction of the 2023 Turkish Presidential Election Results Using Social Media Data

Bozanta, Aysun, Bayrak, Fuad, Basar, Ayse

arXiv.org Artificial Intelligence

Social media platforms influence the way political campaigns are run and therefore they have become an increasingly important tool for politicians to directly interact with citizens. Previous elections in various countries have shown that social media data may significantly impact election results. In this study, we aim to predict the vote shares of parties participating in the 2023 elections in Turkey by combining social media data from various platforms together with traditional polling data. Our approach is a volume-based approach that considers the number of social media interactions rather than content. We compare several prediction models across varying time windows. Our results show that for all time windows, the ARIMAX model outperforms the other algorithms.


Selecting Representative Bodies: An Axiomatic View

Revel, Manon, Boehmer, Niclas, Colley, Rachael, Brill, Markus, Faliszewski, Piotr, Elkind, Edith

arXiv.org Artificial Intelligence

As the world's democratic institutions are challenged by dissatisfied citizens, political scientists and also computer scientists have proposed and analyzed various (innovative) methods to select representative bodies, a crucial task in every democracy. However, a unified framework to analyze and compare different selection mechanisms is missing, resulting in very few comparative works. To address this gap, we advocate employing concepts and tools from computational social choice in order to devise a model in which different selection mechanisms can be formalized. Such a model would allow for desirable representation axioms to be conceptualized and evaluated. We make the first step in this direction by proposing a unifying mathematical formulation of different selection mechanisms as well as various social-choice-inspired axioms such as proportionality and monotonicity.